Efficient fault diagnosis using probing

Authors

  • Irina Rish
  • Mark Brodie
  • Sheng Ma
Abstract

In this paper, we address the problem of efficient diagnosis in real-time systems capable of on-line information gathering, such as sending "probes" (i.e., test transactions such as "traceroute" or "ping") in order to identify network faults and evaluate the performance of distributed computer systems. We use a Bayesian network to model probabilistic relations between the problems (faults, performance degradation) and the symptoms (probe outcomes). Because exact probabilistic inference is intractable in large systems, we investigate approximation techniques, such as a local-inference scheme called mini-buckets (Dechter & Rish 1997). Our empirical study demonstrates the advantages of local approximations for large diagnostic problems: the approximation is very efficient and "degrades gracefully" with noise; also, the approximation error gets smaller on networks with higher confidence (probability) of the exact diagnosis. Since the accuracy of diagnosis depends on how much information the probes can provide about the system states, the second part of our work focuses on the probe selection task. Small probe sets are desirable in order to minimize the costs imposed by probing, such as additional network load and data management requirements. Our results show that, although finding the optimal collection of probes is expensive for large networks, efficient approximation algorithms can be used to find a nearly optimal set.
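
The probe selection task described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: it assumes a noise-free setting with at most one faulty node, where a probe fails exactly when the faulty node lies on its path, and it uses a simple greedy heuristic. The names `refine`, `greedy_probe_selection`, and the example probe paths are hypothetical and chosen only for illustration.

```python
from typing import Dict, FrozenSet, List, Optional, Set

# A system state is either the name of the single faulty node,
# or None for "no fault" (assumption: at most one fault at a time).
State = Optional[str]


def refine(partition: List[FrozenSet[State]],
           path: Set[str]) -> List[FrozenSet[State]]:
    """Split each group of states by whether the probe would fail.

    Assumption: no noise, so a probe fails iff the faulty node is on its path.
    """
    refined = []
    for group in partition:
        failing = group & path   # states in which this probe fails
        passing = group - path   # states in which this probe succeeds
        refined.extend(g for g in (failing, passing) if g)
    return refined


def ambiguity(partition: List[FrozenSet[State]]) -> int:
    """Sum of squared group sizes: smaller means states are better separated."""
    return sum(len(group) ** 2 for group in partition)


def greedy_probe_selection(probes: Dict[str, Set[str]],
                           nodes: Set[str]) -> List[str]:
    """Greedily pick probes until no remaining probe separates more states."""
    states: FrozenSet[State] = frozenset(nodes) | {None}

    # Best separation achievable when every available probe is used.
    target = [states]
    for path in probes.values():
        target = refine(target, path)

    partition = [states]
    chosen: List[str] = []
    while len(partition) < len(target):
        # Pick the unused probe that leaves the least ambiguity.
        best = min((p for p in probes if p not in chosen),
                   key=lambda p: ambiguity(refine(partition, probes[p])))
        chosen.append(best)
        partition = refine(partition, probes[best])
    return chosen


# Hypothetical example: four candidate probes over three nodes.
example_probes = {
    "p1": {"A"},
    "p2": {"A", "B"},
    "p3": {"B", "C"},
    "p4": {"A", "B", "C"},
}
selected = greedy_probe_selection(example_probes, {"A", "B", "C"})
print(selected)
```

The greedy choice trades optimality for speed: it evaluates each remaining probe once per round instead of searching over all subsets of probes. Handling noisy probe outcomes or multiple simultaneous faults would require a probabilistic model such as the Bayesian network mentioned in the abstract.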

Similar resources

Efficient probe selection algorithms for fault diagnosis

Increased network usage for more and more performance-critical applications has created a demand for tools that can monitor network health with minimum management traffic. Adaptive probing has the potential to provide effective tools for end-to-end monitoring and fault diagnosis over a network. Adaptive probing based algorithms adapt the probe set to localize faults in the network by sendi...

Using Adaptive Probing for Real-Time Problem Diagnosis in Distributed Computer Systems

In this work, we focus on cost-efficient techniques for real-time diagnosis in distributed systems that allow an adaptive, on-line selection and execution of appropriate measurements (tests). In particular, one of our applications concerns fault diagnosis in distributed computer systems and networks by using test transactions, or probes (e.g., "traceroute" or "ping" commands). The key efficiency ...

Probabilistic Fault Diagnosis Using Adaptive Probing

Past research on probing-based network monitoring provides solutions based on preplanned probing, which is computationally expensive, is less accurate, and involves a large amount of management traffic. Unlike preplanned probing, adaptive probing proposes to select probes in an interactive manner, sending more probes to diagnose the observed problem areas and fewer probes in the healthy areas, thereby signi...

Strategies for Problem Determination using Probing

As distributed systems continue to grow in size and complexity, scalable and cost-efficient techniques are needed for performing tasks such as problem determination and fault diagnosis. In this paper, we address these tasks using probes, or test transactions, which replace traditional “passive” event-correlation techniques with a more active, real-time information-gathering approach. We provide...

Efficient Probe Station Placement and Probe Set Selection for Fault Localization

Network fault management has been a focus of research activity, with more emphasis on fault localization: zeroing in on the exact source of a failure from a set of observed failures. Fault diagnosis is a central aspect of network fault management. Since faults are unavoidable in communication systems, their quick detection and isolation is essential for the robustness, reliability, and accessibility of a...

Publication date: 2002